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Abstract — A predictive Bayesian model selection 
approach is presented to discriminate coupled mod- 
els used to predict an unobserved quantity of inter- 
est (Qol). The need for accurate predictions arises 
in a variety of critical applications such as climate, 
aerospace and defense. A model problem is intro- 
duced to study the prediction yielded by the cou- 
pling of two physics/sub-components. For each single 
physics domain, a set of model classes and a set of sen- 
sor observations are available. A goal-oriented algo- 
rithm using a predictive approach to Bayesian model 
selection is then used to select the combination of 
single physics models that best predict the Qol. It 
is shown that the best coupled model for prediction 
is the one that provides the most robust predictive 
distribution for the Qol. 

Keywords: Predictive Model Selection, Quantity of In- 
terest, Model Validation, Decision Making, Bayesian 
Analysis 

1 Introduction 

With the exponential growth of available computing 
power and the continued development of advanced nu- 
merical algorithms, computational science has undergone 
a revolution in which computer models are used to simu- 
late increasingly complex phenomena. Additionally, such 
simulations are guiding critical decisions that affect our 
welfare and security, such as climate change, performance 
of energy and defense systems and the biology of dis- 
eases. Reliable predictions of such complex physical sys- 
tems requires sophisticated mathematical models of the 
physical phenomena involved. But also required is a sys- 
tematic, comprehensive treatment of the calibration and 
validation of the models, as well as the quantification 
of the uncertainties inherent in such models. While re- 
cently some attention has been paid to the propagation 
of uncertainty, considerably less attention has been paid 
to the validation of these complex, multiphysics models. 
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This becomes particularly challenging when the quantity 
of interest (Qol) cannot be directly measured, and the 
comparison of model predictions with real data is not pos- 
sible. Such Qols may be the catastrophic failure of the 
thermal protection system of a space shuttle reentering 
the atmosphere or the different performance characteris- 
tics of nuclear weapons in order to maintain the nuclear 
stockpile without undergoing underground nuclear test- 
ing. 

In this paper, we present an intuitive interpretation of the 
predictive model selection in the context of Bayesian anal- 
ysis. While the predictive model selection is not an new 
idea, see Refs.[31l31[Z]j here we emphasize the connection 
between the Qol-aware evidence and the Bayesian model 
averaging used for estimation. This new interpretation 
of the Bayesian predictive model selection reveals that 
the best model for prediction is the one which provides 
the most robust predictive probability density function 
(pdf) for the Qol. Also, the latest advances in Markov 
Chain Monte Carlo (MCMC) 2] and estimators based 
on the fc-nearest neighbor [8] are used to compute the 
information theoretic measures required in the problem 
of predictive selection of coupled models. It is further 
argued that equivalence between predictive model selec- 
tion and conventional Bayesian model selection can be 
reached by performing optimal experimental design [H] 
for model discrimination. The structure of the paper is 
as follows: first the selection problem of coupled models 
is stated in Section [2j The conventional Bayesian model 
selection is described in Section [3] and the extension to 
Qol-aware evidence is derived in Section |4] The model 
problem and numerical results are presented in Section 
[5] and Section |6] respectively. The conclusions and future 
work are discussed in Section [71 

2 Problem Statement 

Here we are interested in the prediction of a coupled 
model. The problem of selecting the best coupled model 
in the context of the Qol is to find the combination of sin- 
gle physics models that best predict an unobserved Qol 
in some sense, see Fig. [T] Thus at the single physics 
level, we have two physics A and B, each with a model 



class set, M^ and M^ , and a set of observations D^ 
and D^ respectively. The cardinality of the two sets of 
model classes are jA^^^I — Ka, \M^\ — Kb, and the the 
definition of model classes in each set is given by the state 
equations and measurement models as follows, 
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Figure 1: Predictive Selection for Coupled Models 

All the possible couplings of single physics models 
yield the set of coupled model classes A4 = {Mij — 



{M( 



,Mf)|M/^ e M' 



,A/f 



e AA } with cardinality 



\M\ = KaKb- The definition of a coupled model class in 
this set is given by the state and measurement equation, 
and in addition we also have the model for the Qol, 



M,. 
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Having the set of all the coupled models, the selection 
problem becomes finding the best coupled model in the 
set, Jv[, for prediction purposes. 

3 Bayesian Model Selection 

In the context of A^-closed perspective, the conventional 
Bayesian approach to model selection is to choose the 
model which has the highest posterior plausibility. 



M* 



arg max 7r(M|_D, A^). 



(1) 



Given the data at the single physics level, one can com- 
pute the posterior model plausibility for all the models in 
the M."^ and M^ sets, as the product between evidence 
and prior plausibility. 



TT(Mf\D^,M) ex TT{D^\Mf,M)n{Mf\M). 



(2) 



The evidence is obtained during the calibration process 
for each single-physics models, and it is given by the nor- 
malization constant in the Bayes rule, used to compute 
the posterior pdf of model parameters. 



7r(6> 



A^j^A^ 



MfM) 



n{D^\et,M^,MMet\M^,M) 

tt{DA\M^,M) 



With no data at the coupled level, one can easily obtained 
the posterior plausibility for coupled models as, 



n{A'hj\D,M) 



TT{M^^\D^,M)TT{Mf\D^,M). 



(3) 



Notice, that the best coupled model is given by coupling 
the best models at the single physics level. Computa- 
tionally this is advantageous as there is no need to build 
all the possible coupled models using the single physics 
models. Looking at just two coupled models, we can say 
that we prefer model Mi over model M2 if and only if 
tt{Mi\D,M) > 7r{M2\D,M). This inequality can be re- 
casted as the following product of ratios: 



7r(D|Afi) n{Mi,M) 
Tr{D\M2) tt{M2,M) 

Baycs factor prior odds 



> 1 



(4) 



The log-evidence is the trade-off between model complex- 
ity and how well the model fits the data. In other words 
the evidence yields the best model that obeys the law of 
parsimony. 



ln[7r(D|A/i)] =E [ln[TT{D\e, Mi) 



KL 7r(0|D,Afi) II 7r(6»|Mi) 



(5) 



where the model complexity is given by the KuUback- 
Leibler divergence between posterior pdf and prior pdf 
of model parameters [2] . Therefore, this model selection 
scheme makes use of the following information in choos- 
ing the best model: model complexity, data fit, and prior 
knowledge. Since it obeys Occam's razor, we implicitly 
gain some robustness with respect to predictions. How- 
ever, if we have two different Qols that we would like to 
predict, it is not obvious if the model selected under this 
scheme will be able to provide equally good predictions 
for both Qols. This is due to the fact that the informa- 
tion about the Qol is not explicitly used in the selection 
criterion. In the following sections, we will present an 
extension of model selection scheme to also account for 
the Qol, and discuss its implications. 

4 Predictive Model Selection 

Given a model class set Ai — {Mi, M2, . . . , Mk} (for 
simplicity the double index will be ignored in the 
model class notation), and a set of observations D — 
{di, d2, . . . , d„} generated by an unknown model from 
A4, we are concerned with the problem of selecting the 
best model to predict an unobserved quantity of inter- 
est q. The selection of the model which best estimates 



the pdf of the Qol is seen here as a decision problem [7] . 
First one has to find the predictive distribution for each 
model class and the selection of the best model class is 
based on the utility of its predictive distribution. Given 
all the available information, the Bayesian predictive dis- 
tribution conditioned on a model Mj is given by: 



q|L», Mj) = I TT{q\9j,D, Mj) n{ej\D, Mj) d9j (6) 



7r( 



where the posterior pdf for model parameters is computed 
using Bayes rule. In the followings it is assumed that 
the true distribution of the Qol is generated by a model 
Mj{6j) £ Mj, called the true model. Thus, the true pdf 
of the Qol can be written as: 



Tr{q\e,s,D,M) 



K 

E 



Ti{ci\0 J, D,Mj)s, 



(7) 



where 9 = [61,62, ... ,0k), s = {si,S2, ■ ■ ■ ,sk), and 
Sj = 1 if and only if the true model belongs to model 
class Mj . Let 11^(6, s, Mj ) describe the utility of choosing 
the pdf associated with model class Mj as the predictive 
distribution of the Qol, q. Here the utility function is de- 
fined as the negative KuUback-Leibler divergence of the 
true distribution and the predictive distribution of Mj : 



f/q(6>,S,M, 



KL{Tv{q\e,s, D,M) II 7r(q|D,Mj 



(8) 



The model that maximizes the following expected utility 
(Qol-aware evidence) is the model of choice for predictive 
purposes: 

Ar^argmax Uq{6,s,Mj)TT{d,s\D,M)ddds (9) 

Ee,,lU^(e,s,M,)] 

With few mathematical manipulations, the expected util- 
ity can be written as follows, 

Eg,slUq(e,s,Mj)] = f Uq(e,s,Mj)n{e\s,D,M)TT(s\D,M)deds 

K 
= ^-K{K'h\D,M) Uq{e„s„Mj)n{e,\D,M,)de, (10) 



KL 7r(q|0„D,M,)||7r(qp,Afj) 



-^7r(Af,|D,A^)£;e, 



4.1 Interpretation of the expected utility 
used in model selection 

Consider now that only two model classes exist in our 
model set A4 = {Mi,M2}- We prefer model class Mi 



over M2 and write Afi >- M2 if and only if: 

Se.s[f/q(0,s,Afi)] > Se,s[C/q(e,s,M2)] (11) 



Substituting Eq.( 10 ) into Eq.( 11 1 the following model se- 



lection criterion can be derived: 

R{Mi\\M2) TTiD\Mi) n{Mi,M) 
R{M2\\Mi) tt{D\M2) tt{M2,M) 



> 1 



(12) 



Risk ratio Baycs factor Prior odds 

where the numerator in the predictive risk ratio is given 
by the following expressions. The denominator is ob- 
tained by analogy with the numerator. 



RiMiWMi) = Eg, 



-Eoi 



KL 7r(q|6>i,D,Mi) || 7r(q|D,M2 



KL 7r(ql6>i,D,Mi) || t:{ci\D,Mi] 



(13) 



The model selection criterion in Eq.(12| can be inter 



preted as the evidence of model class Mi in favor of 
model class M2, and is composed of prior evidence given 
by the prior odds, experimental evidence given by the 
Bayes factor and the predictive risk ratio which ac- 
counts for the loss of choosing the wrong model. Ac- 
cording to Trottini and Spezzaferri [7], the expecta- 
tions in the above ratio have the following meaning: 



-Bei 



KL 7r(q|6»i,7:i,Afi)||7r(q|D,M2) 



the risk of choos- 



even if we report 



ing model class M2 when the true model belongs to Mi, 

Eg, KL(n{ci\ei,D,Mi)\\n{(i\D,Mi 

the distribution 7r(q|D, Mi) when the true model belongs 
to All : there is a risk incurred due to the unknown value 
of 61 that generated the true model. Comparing with the 
previous model selection scheme, the following informa- 
tion is used in this scheme to select the best model: Qol, 
model complexity, data fit, and prior knowledge. 



4.2 



Calculating the expected utility used in 
model selection 



The calculation of the Qol-aware evidence in Eq.(lO) is 



challenging as we are dealing with high dimensional in- 
tegrals, and the number of samples in the posterior dis- 
tributions is dependent on the MCMC algorithms and 
computational complexity of the forward model. Thus, 
we would like to simplify this calculation. Starting from 



Eq.(lO) the following expression for the expected utility 



can be obtained: 



i?e,s[t/q(0,s,M,)] = -J2^iM^D,M) f ^(g.|Z?, M.)n{ci\e,, D, M,) log "'^'^'^ n^,', 5!'^ dO^dci 
^-1 J ■K[ci\D,Mj) 



1=1 

K 



Y^'k{M,\D,M) f ■K{ci,e,\D, M,)logn{ci\0„ D, M,)de,dci+ /' 7r(q|D,M) log7r(q|D, A/,)dq (14) 



Where the predictive pdf under all models is given by, 

K 

4q|i?,X) = ^7r(M,|AA^)vr(q|AM0. (15) 



j=i 



Since the first term in Eq.(14) is the same for all models 
Mi, ior j = 1 . . .K, the optimization in Eq.(|9l is equiva- 



lent with maximizing the second term in Eq.(14), which 



is the negative cross-entropy between the predictive dis- 
tribution conditioned on all the models and the predictive 
distribution conditioned on the jth model: 



M* =arg max^-H( ^(q|Z?,A^),7r(q|i5,M,) 1 (16) 

By writing the optimization as a minimization instead of 
a maximization and subtracting the entropy of the pre- 
dictive distribution conditioned on all the models, then 
one can rewrite the model selection problem as. 



M* = arg mill H( 7r(q|D,X), 7r(q|D, M^) ) - H( 7r(q|D,X) 



: arg mill KL| 7r(q|D, A^) 



7r(q|D,M,) 



(17) 



Thus, the best model to predict an unobserved quantity 
of interest q is the one whose predictive distribution best 
approximates the predictive distribution conditioned on 
all the models. This is rather intuitive as all we can say 
about the unobserved quantity of interest is encoded in 
the predictive distribution conditioned on all the models. 
The predictive pdf under all models being the most robust 
estimate of the Qol for this problem. 

This model selection scheme reveals that when the pos- 
terior model plausibility is not able to discriminate be- 
tween the models, the prediction obtained with the se- 
lected model is the most robust prediction we can obtain 
with one model. Thus we are able to account for model 
uncertainty when predicting the Qol. On the other hand, 
in the limit, for discriminatory observations, when the 
posterior plausibility is one for one of the models, the 
two selection schemes become equivalent. 

5 Model Problem 

The model problem consists of a spring-mass-damper sys- 
tem that is driven by an external force. The spring-mass- 
damper and the forcing function are considered to be sep- 
arate physics such that the full system model consists of 
a coupling of the dynamical system modeling the spring- 
mass-damper system and a function modeling the forc- 
ing. In this model problem, synthetic data are generated 
according to a truth system. 

5.1 Models 

This section describes the models that will form the sets 
of interest for the single physics. The models of the 
spring-mass-damper system take the following form: 



The mass is assumed to be perfectly known, m — 1, 
and the damping coefhcient c is a calibration parame- 
ter. Model form uncertainty is introduced through the 
spring models k{x). Three models are considered: a lin- 
ear spring (OLS), a cubic spring (OCS), and a quintic 
spring (OQS), given by the following relations: 



koLs{x) = kifi 
kocs[x) = ks,fi + kz^2X^ 



•^OQS 



(x) 



^5,0 



^5 2a; -I- k^AX 



(19) 
(20) 
(21) 



The models of the forcing function are denoted f{t). 
Three models are considered: simple exponential decay 
(SED), oscillatory linear decay (OLD), and oscillatory 
exponential decay (OED): 



fsEoit) = Foexp(-i/r) 



(22) 



foMt) - i ^"(' - '/") ^""^r>t '^ ' ° - ' - " (23) 



foEoit) = foexp(-i/r) [asin(a;f) 4- 1] 



(24) 



The coupling of the spring-mass-damper and the forcing 
is trivial. Thus, only a single coupling model is consid- 
ered, and the coupled model is given by 



mx + ex + k{x)x = f{t). 



(25) 



There are three choices for k{x) and three choices for f{t), 
leading to nine total coupled models. 

5.2 The Truth System 

To evaluate the two selection schemes: Bayesian model 
selection, and predictive model selection, we can con- 
struct the true system which will be used to generate data 
and give the true value of the Qol. The comparison of the 
two selection criteria will be done with respect to differ- 
ent subsets of models, and the ability of the best model 
to predict the true value of the Qol. The true model is 
described by OQS-OED: 



mx + cx + koQs{x)x = foED 



(26) 



mx + ex + k(x)x = 0. 



(18) 



where to = 1, c = 0.1. The parameters for the spring 
model are set to fcs.o = 4, kz^2 = —5, and ^5^4 = 1. The 
true forcing function is given by the following values for 
the parameters: Fq = \, t = 27r, a = 0.2, uj = 2. 

The Qol of the coupled model is assumed to be the max- 
imum velocity Xmax = maxtgR+ |a;(i)|. The observable 
for physics A is given by the kinetic energy versus time: 
\x{tiY ioT i = 1, . . . ,N. Note that this contains the same 
information as the velocity except that it is ambiguous 
with respect to the sign. The observable for physics B 
is given by the force versus time: fiU) for ? = 1, . . . , M. 
In both cases simulated observations have been gener- 
ated by perturbing the deterministic predictions of the 
true model, with a log-normal multiplicative noise with 
standard deviation of 0.1. 



6 Numerical Results 

The inverse problem of calibrating the model parame- 
ters from the measurement data is solved using MCMC 
simulations. In our simulations, samples from the poste- 
rior distribution are obtained using the statistical library 
QUESO [3 E] equipped with the Hybrid Gibbs Transi- 
tional Markov Chain Monte Carlo method proposed in 
Ref. [T]. One advantage of this MCMC algorithm is 
that it provides an accurate estimate of the log-evidence 
using the adaptive thermodynamic integration. Estima- 
tors based on fc-nearest neighbor are used to compute the 



Table 1: Results Case 1 



KuUback-Leibler divergence in Eq.(17), see Appendix. 
The use of these estimators is advantageous especially 
when only samples are available to describe the underly- 
ing distributions. 

Three different scenarios are constructed to assess the 
predictive capability of the models selected using the two 
selections schemes: Bayesian model selection and pre- 
dictive model selection. All the uncertain parameters of 
the models are considered uniformly distributed and the 
model error has also been calibrated and propagated to 
the Qol. 

First, all the models are included in the two sets, in- 
cluding the components used to generate the true model. 
For oscillators the model class set is given by A^'^ = 
{MoLS,Mocs,MoQs} and for forcing function M^ — 
{Msed,Mold,Moed}- a number of 10 measurements 
have been considered for the oscillators and 61 for the 
forcing. Table [T] summarizes the results obtained after 
applying the two approaches. On the first column and 
the first row, under each model one can find the model 
plausibility after calibration, and the first number in a 
cell gives the plausibility of the corresponding coupled 
model. The number in the parenthesis is the KL diver- 
gence used in the predictive model selection. In this case, 
the observations provided are enough to discriminate the 
models at single physics level, oscillator OQS and the 
forcing OED are selected in this case. Here, the pre- 
dictive model selection is consistent with the plausibility 
based model selection. Notice that the true model be- 
longs to the model class OQS-OED, and the prediction 
of the selected model covers the true value of the Qol, see 
Figj2^. One computational advantage is that when dis- 
criminatory observations are available, one does not need 
to carry the analysis on all the coupled models, just on 
the coupling of the best single physics ones and still be in 
agreement with predictive requirements. We argue that 
for very complex and hierarchical systems, with multiple 
levels of coupling, such a situation should be preferred 
and exploited by designing experiments to collect mea- 
surements intended to discriminate models. 

For the second scenario we remove the forcing that gen- 
erated the true model. Now the sets of model classes 
are given by M"^ = {MolS:Mocs,Moqs} and M^ = 
{Msed,Mold}- The same number of observations are 
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1.00 
(0.00) 



considered for the oscillators and 7 measurements for the 
forcing models. The results are presented in Table [2] As 
before, we are able to discriminate the oscillators, how- 
ever we cannot say the same for the forcing models. This 
can be seen in Fig. [2fc, where we can see the predic- 
tion of the observable with the two forcing models after 
calibration. The data supports almost equally well both 
forcing functions. In this case the two selection schemes 
yield two different models: OQS-OLD for the Bayesian 
model selection and OQS-SED for the predictive model 
selection, see Table [2j Looking at their predictions for 
the Qol, Fig. ^p, we see that the pdf provided by the 
model selected using conventional Bayesian model selec- 
tion doesn't even cover the true value of the Qol, whereas 
the one chosen by the predictive selection scheme covers 
the true value of the Qol in the tail. Thus, while the plau- 
sibility based selection ignores the model uncertainty, the 
predictive selection approach yields the model with the 
most robust predictive pdf for the Qol. This prediction 
incorporates as much as possible model uncertainty that 
one can obtain with just one model. Therefore, the pre- 
dictive approach is recommended in the case when dis- 
criminatory observations are not available and one model 
has to be chosen instead of model averaging, especially 
for complex hierarchical systems. 
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Table 3: Results 


5 Case 3 
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0.48 


OLD 

0.52 
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ocs 
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OQS* 
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(1.64) 











Lastly, we are not including in the model sets any of 
the components that generated the true model. Thus, 
X^ = {MoLS,Mocs} and M^ = {Msed,Mold}. In 
this case only 5 observations are considered for the oscil- 
lators and 4 for the forcing functions. We can see from 
Table [3] that at the single physics level we are not able 
to discriminate the oscillators or the forcing functions. 
For the coupled models both approaches choose the same 
model OCS-OLD, however looking at the predictions of 



all coupled models, including their average prediction we 
see that all of them give very low likelihood for the true 
value of the Qol. This case emphasizes that the predic- 
tion approach has the same drawback as the Bayesian 
model selection and Bayesian model averaging. Mainly, 
we are at the mercy of our hypotheses and the only way 
to escape this case is to generate additional hypotheses. 



where dx is the dimensionality of the random vari- 
able X, Nn and Nn^i give the number of samples 
{X;|i = l,...,iV„} ^ p{x\D„) and {Xi_,\j = 
1, . . . ,-/V„„i} ^ p{x\Dn-i) respectively, and the two 
distances i>n-i{i) and Pnii) are defined as follows: 
Pnii) = minj=i...Ar^ j^i ||X;, 
niin,-=i...w _j ||X\ - Xi^_-,\\oo 



X^lloo and Vn-iii) 
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Figure 2: Results for the three cases considered 

7 Conclusions 

In this paper, we have presented a model selection 
criterion that accounts for the predictive capability of 
coupled models. This is especially useful when com- 
plex/ multiphysics models are used to calculate a quantity 
of interest. It has been shown that the prediction obtain 
with the model chosen by the predictive approach, is the 
most robust prediction that one can obtain with just one 
model. This is because in part it incorporates model un- 
certainty, while conventional Bayesian model selection ig- 
nores it. For discriminatory measurements the Bayesian 
model selection and predictive model selection are equiv- 
alent. This suggests that when additional data collection 
is possible then designing experiments for model discrimi- 
nation is computationally preferred for complex models. 

Appendix 

The approximation of the KuUback-Leibler divergence is 
based on a fc- nearest neighbor approach [8]. 
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